Abstract:
USE OF MULTICHANNEL DECORRELATION FOR IMPROVED MULTICHANNEL UPMIXING. The present invention relates to a system of linear equations that is used to upmix a number N of audio signals to generate a larger number M of audio signals that are psychoacoustically decorrelated with respect to one another and that can be used to improve the representation of a diffuse sound field. The linear equations are defined by a matrix that specifies a set of vectors in an M-dimensional space that are substantially orthogonal to each other. Methods for deriving the system of linear equations are disclosed.
Publication number: BR112012018291B1
Application number: R112012018291-9
Filing date: 2011-01-07
Publication date: 2020-10-27
Inventor: David S. Mcgrath
Applicant: Dolby Laboratories Licensing Corporation;
Main IPC class:
Description:

FIELD OF THE INVENTION
[001] The present invention relates in general to signal processing for audio signals and more specifically to signal processing techniques that can be used to generate audio signals that represent a diffuse sound field. These signal processing techniques can be used in audio applications such as upmixing, which derives some number of output channel signals from a smaller number of input channel signals.
BACKGROUND OF THE INVENTION
[002] The present invention can be used to improve the quality of audio signals obtained from upmixing; however, it can be used advantageously with essentially any application that requires one or more audio signals that represent a diffuse sound field. The following description refers more particularly to upmixing applications.
[003] A process known as upmixing derives some number M of audio signal channels from a smaller number N of audio signal channels. For example, audio signals for five channels designated as left (L), right (R), center (C), left surround (LS) and right surround (RS) can be obtained by upmixing audio signals for two input channels designated here as left input (Li) and right input (Ri). An example of an upmixing device is the Dolby® Pro Logic® II decoder disclosed in Gundry, “A New Active Matrix Decoder for Surround Sound,” 19th AES Conference, May 2001. An upmixer that uses this particular technology analyzes the phase and amplitude of two input signal channels to determine how the sound field they represent is intended to convey directional impressions to a listener. Depending on the desired artistic effect of the input audio signals, the upmixer should be able to generate output signals for five channels that give the listener the sensation of one or more aural components that have apparent directions within an enveloping diffuse sound field that has no apparent direction. The present invention is directed to generating output audio signals for one or more channels that can create, through one or more acoustic transducers, a higher quality diffuse sound field.
[004] Audio signals intended to represent a diffuse sound field should create an impression in a listener that sound is emanating from many, if not all, directions around the listener. This effect is opposite to the well-known phenomenon of creating a phantom image, or apparent direction of sound, between two loudspeakers by reproducing the same audio signal through each of those loudspeakers. A high quality diffuse sound field typically cannot be created by reproducing the same audio signal through multiple loudspeakers located around a listener. The resulting sound field has widely varying amplitude at different listening locations, often changing by large amounts for small changes in location. It is not uncommon for certain positions within a listening area to seem devoid of sound for one ear but not for the other. The resulting sound field seems artificial.
DESCRIPTION OF THE INVENTION
[005] It is an object of the present invention to provide audio signal processing techniques to derive two or more channels of audio signals that can be used to produce a high quality diffuse sound field through acoustic transducers such as speakers.
[006] In accordance with one aspect of the present invention, M output signals are derived from N input audio signals for presentation of a diffuse sound field, where M is greater than N and is greater than two. This is done by deriving K intermediate audio signals from the N input audio signals so that each intermediate signal is psychoacoustically decorrelated with the N input audio signals and, if K is greater than one, each intermediate signal is psychoacoustically decorrelated with all other intermediate signals. The N input audio signals and the K intermediate signals are mixed to derive the M output audio signals according to a system of linear equations with coefficients of a matrix that specify a set of N + K vectors in an M-dimensional space. At least K of the N + K vectors are substantially orthogonal to all other vectors in the set. The quantity K is greater than or equal to one and less than or equal to M-N.
[007] According to another aspect of the present invention, a matrix of coefficients for a system of linear equations is obtained for use in mixing N input audio signals to derive M output audio signals for presentation of a diffuse sound field. This is done by obtaining a first matrix that has coefficients that specify a set of N first vectors in an M-dimensional space; deriving a set of K second vectors in the M-dimensional space, with each second vector being substantially orthogonal to each first vector and, if K is greater than one, to all other second vectors; obtaining a second matrix that has coefficients that specify the set of K second vectors; concatenating the first matrix with the second matrix to obtain an intermediate matrix that has coefficients that specify a union of the set of N first vectors and the set of K second vectors; and preferably scaling the coefficients of the intermediate matrix to obtain a signal processing matrix that has a Frobenius norm within 10% of the Frobenius norm of the first matrix, where the coefficients of the signal processing matrix are the coefficients of the system of linear equations.
[008] The various features of the present invention and its preferred embodiments can be better understood by reference to the following discussion and the accompanying drawings, in which like reference numerals refer to like elements in the several figures. The contents of the following discussion and of the drawings are presented as examples only and should not be understood as representing limitations on the scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] Figure 1 is a schematic block diagram of an audio signal processing device that can incorporate aspects of the present invention.
[0010] Figure 2 is a schematic illustration of a base upmixing matrix.
[0011] Figure 3 is a schematic illustration of a base upmixing matrix concatenated with an augmentation upmixing matrix.
[0012] Figure 4 is a schematic illustration of a signal decorrelator using delay components.
[0013] Figure 5 is a schematic illustration of a signal decorrelator using a subband filter with a bimodal frequency-dependent change in phase and a subband filter with a frequency-dependent delay.
[0014] Figure 6 is a schematic block diagram of a device that can be used to implement various aspects of the present invention.
MODES FOR CARRYING OUT THE INVENTION
A. Introduction
[0015] Figure 1 is a schematic block diagram of a device 10 that can incorporate aspects of the present invention. Device 10 receives audio signals for one or more input channels from signal path 19 and generates audio signals along signal path 59 for a plurality of output channels. The small line that crosses signal path 19, like the small lines that cross the other signal paths, indicates that the path carries signals for one or more channels. The symbols N and M immediately below the small crossing lines indicate that the respective signal paths carry signals for N and M channels, respectively. The symbols x and y immediately below some of the small crossing lines indicate that the respective signal paths carry an unspecified number of signals, which is not important for understanding the present invention.
[0016] In device 10, the input signal analyzer 20 receives audio signals for one or more input channels from signal path 19 and analyzes them to determine which portions of the input signals represent a diffuse sound field and which portions represent a sound field that is not diffuse. A diffuse sound field creates an impression in a listener that sound is emanating from many, if not all, directions around the listener. A non-diffuse sound field creates the impression that sound is emanating from a particular direction or from a relatively narrow range of directions. The distinction between diffuse and non-diffuse sound fields is subjective and may not always be definite. While this may affect the performance of practical implementations that employ aspects of the present invention, it does not affect the underlying principles of the present invention.
[0017] Portions of the input audio signals that are deemed to represent a non-diffuse sound field are passed along signal path 28 to the non-diffuse signal processor 30, which generates along signal path 39 a set of M signals that are intended to reproduce the non-diffuse sound field through a plurality of acoustic transducers such as loudspeakers. An example of an upmixing device that performs this type of processing is the Dolby Pro Logic II decoder mentioned above.
[0018] The portions of the incoming audio signals that are considered to represent a diffuse sound field are passed along the signal path 29 to the diffuse signal processor 40, which generates along the signal path 49 a set of M signals that intend to reproduce the diffuse sound field through a plurality of acoustic transducers such as loudspeakers. The present invention is directed to the processing carried out in the diffuse signal processor 40.
[0019] The summing component 50 combines each of the M signals from the non-diffuse signal processor 30 with a respective one of the M signals from the diffuse signal processor 40 to generate an audio signal for a respective one of the M output channels. The audio signal for each output channel is intended to drive an acoustic transducer such as a loudspeaker.
[0020] The present invention is directed to the development and use of a system of mixing equations to generate a set of audio signals that can represent a diffuse sound field. These mixing equations can be used in the diffuse signal processor 40, for example. The remainder of this disclosure assumes that the number N is greater than or equal to one, the number M is greater than or equal to three, and the number M is greater than the number N.
[0021] Device 10 is merely one example of how the present invention can be used. The present invention can be incorporated into other devices that differ in function or structure from what is shown in figure 1. For example, signals that represent both the diffuse and the non-diffuse portions of a sound field can be processed by a single component. Some implementations of a distinct diffuse signal processor 40 are described below that mix signals according to a system of linear equations defined by a matrix. Various parts of the processes for both the diffuse signal processor 40 and the non-diffuse signal processor 30 can be implemented by a system of linear equations defined by a single matrix. Furthermore, aspects of the present invention can be incorporated into a device without also incorporating the input signal analyzer 20, the non-diffuse signal processor 30 or the summing component 50.
B. First Derivation Method
[0022] The diffuse signal processor 40 generates along path 49 a set of M signals by mixing the N channels of audio signals received from path 29 according to a system of linear equations. For ease of description in the following discussion, the portions of the N channels of audio signals received from path 29 are referred to as intermediate input signals and the M channels of signals generated along path 49 are referred to as intermediate output signals. The mixing operation uses a system of linear equations that can be represented by a matrix multiplication as shown in expression 1:
$$Y = C \cdot X \quad \text{for } 1 \le K \le (M-N) \qquad (1)$$
[0023] where X = column vector representing N + K signals obtained from the N intermediate input signals;
[0024] C = M x (N + K) matrix or arrangement of mixing coefficients; and
[0025] Y = column vector that represents the M intermediate output signals.
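As a minimal sketch of the matrix multiplication in expression 1, the fragment below mixes one sample from each of the N + K signals into M output samples. The function name `mix` and the 3 x 2 coefficient values are hypothetical and are chosen only to make the arithmetic easy to follow; they are not taken from this description.

```python
def mix(C, X):
    """Return Y = C . X for an M x (N+K) coefficient matrix C (list of rows)
    and a vector X holding one sample from each of the N+K signals."""
    if any(len(row) != len(X) for row in C):
        raise ValueError("each row of C needs one coefficient per input signal")
    # Each output sample is the weighted sum of all input samples.
    return [sum(c * x for c, x in zip(row, X)) for row in C]


# Hypothetical coefficients: M = 3 outputs, N + K = 2 inputs (N = 1, K = 1).
C = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
Y = mix(C, [2.0, 4.0])  # one time-domain sample per input signal
```

In a time-domain implementation the same call would simply be repeated for every sample instant.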
[0026] The mixing operation can be performed on signals represented in the time domain or in the frequency domain. The following discussion refers more particularly to time-domain implementations.
[0027] If desired, the same system of linear mixing equations can be expressed by transposing the vectors and the matrix as follows:
$$Y^T = X^T \cdot C^T \qquad (2)$$
[0028] where $X^T$ = row vector representing the N + K signals obtained from the N intermediate input signals;
[0029] $C^T$ = (N + K) x M transpose of matrix C; and
[0030] $Y^T$ = row vector representing the M intermediate output signals.
[0031] The following description uses notations and terminology such as rows and columns that are consistent with expression 1; however, the principles of the present invention can be derived and applied to other forms or expressions such as expression 2 or an explicit system of linear equations.
[0032] As shown in expression 1, K is greater than or equal to one and less than or equal to the difference (M-N). As a result, the number of signals $X_i$ and the number of columns in matrix C are between N + 1 and M.
[0033] The coefficients of matrix C can be obtained from a set of N + K unit-magnitude vectors in an M-dimensional space that are "substantially orthogonal" to one another. Two vectors are considered to be substantially orthogonal to each other if their dot product is less than 35% of the product of their magnitudes. This corresponds to an angle between the vectors of about seventy degrees to about 110 degrees. Each column in matrix C can have M coefficients that correspond to the elements of one of the vectors in the set. For example, the coefficients in the first column of matrix C correspond to one of the vectors V in the set whose elements are denoted as $(V_1, \ldots, V_M)$ such that $C_{1,1} = p \cdot V_1, \ldots, C_{M,1} = p \cdot V_M$, where p is a scale factor used to scale the matrix coefficients as may be desired. Alternatively, the coefficients in each column j of matrix C can be scaled by different scale factors $p_j$. In many applications, the coefficients are scaled so that the Frobenius norm of the matrix is equal to or within 10% of $\sqrt{N}$. Additional aspects of scaling are discussed below.
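The 35% criterion can be checked directly. The sketch below is illustrative only: the function name `substantially_orthogonal` is hypothetical, and it assumes the criterion applies to the magnitude of the dot product, which is what makes it correspond to the stated range of about 70 to 110 degrees.

```python
import math


def substantially_orthogonal(u, v, limit=0.35):
    """True if |u . v| is less than `limit` times the product of the magnitudes.

    Assumption: the absolute value of the dot product is compared, so angles
    slightly above and slightly below 90 degrees are treated alike.
    A zero-magnitude vector never passes the test.
    """
    dot = sum(a * b for a, b in zip(u, v))
    mags = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return abs(dot) < limit * mags


# A pair of vectors 75 degrees apart passes; a pair 45 degrees apart does not.
ok = substantially_orthogonal(
    [1.0, 0.0], [math.cos(math.radians(75)), math.sin(math.radians(75))])
too_close = substantially_orthogonal([1.0, 0.0], [1.0, 1.0])
```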
[0034] The set of N + K vectors can be derived in any way that may be desired. One method creates an M x M matrix G of coefficients with pseudo-random values that have a Gaussian distribution and calculates the singular value decomposition of this matrix to obtain three M x M matrices denoted here as U, S and V. The U and V matrices are both unitary matrices. Matrix C can be obtained by selecting N + K columns from either matrix U or matrix V and scaling the coefficients in these columns to achieve a Frobenius norm equal to or within 10% of $\sqrt{N}$. A preferred method that relaxes some of the requirements for orthogonality is described below.
[0035] The N + K input signals are obtained by decorrelating the N intermediate input signals with respect to each other. The type of decorrelation that is desired is referred to herein as "psychoacoustic decorrelation". Psychoacoustic decorrelation is less stringent than numerical decorrelation in that two signals may be considered psychoacoustically decorrelated even if they have some degree of numerical correlation with each other.
[0036] The numerical correlation of two signals can be calculated using a variety of known numerical algorithms. These algorithms yield a measure of numerical correlation called a correlation coefficient that varies between negative one and positive one. A correlation coefficient with a magnitude equal to or close to one indicates that the two signals are closely related. A correlation coefficient with a magnitude equal to or close to zero indicates that the two signals are generally independent of each other.
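The description does not name a particular algorithm. As one illustration, the sketch below uses the Pearson correlation coefficient, which is an assumption made here; the function name `correlation_coefficient` and the test signals are hypothetical.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equal-length sample sequences.

    The result lies between -1 and +1. Constant signals have no defined
    correlation, so they are rejected.
    """
    if len(a) != len(b) or not a:
        raise ValueError("signals must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_a * var_b)


# One full period of a sine and of a cosine: numerically uncorrelated.
sine = [math.sin(2 * math.pi * k / 64) for k in range(64)]
cosine = [math.cos(2 * math.pi * k / 64) for k in range(64)]
r_quadrature = correlation_coefficient(sine, cosine)
# A scaled and offset copy of a signal is fully correlated with it.
r_copy = correlation_coefficient(sine, [2.0 * s + 1.0 for s in sine])
```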
[0037] Psychoacoustic correlation refers to correlation properties of audio signals that exist across frequency subbands that have a so-called critical bandwidth. The frequency-resolving power of the human auditory system varies with frequency throughout the audio spectrum. The human ear can discern spectral components that are closer together in frequency at lower frequencies below about 500 Hz, but not as close together as the frequency progresses upward to the limits of audibility. The width of this frequency resolution is referred to as a critical bandwidth and, as just explained, it varies with frequency.
[0038] Two signals are said to be psychoacoustically decorrelated with respect to each other if the average numerical correlation coefficient across psychoacoustic critical bandwidths is equal to or close to zero. Psychoacoustic decorrelation is achieved if the numerical correlation coefficient between the two signals is equal to or close to zero at all frequencies. Psychoacoustic decorrelation can also be achieved even if the numerical correlation coefficient between the two signals is not equal to or close to zero at all frequencies, provided that the numerical correlation varies such that its average across each psychoacoustic critical band is less than half of the maximum numerical correlation coefficient for any frequency within that critical band.
[0039] Psychoacoustic decorrelation can be achieved using delays or special types of filters, which are described below. In many implementations, N of the N + K signals $X_i$ can be taken directly from the N intermediate input signals without using any delays or filters to achieve psychoacoustic decorrelation, because these N signals represent a diffuse sound field and are likely to be already psychoacoustically decorrelated.
C. Enhanced Derivation Method
[0040] If the signals generated by the diffuse signal processor 40 are combined with other signals that represent a non-diffuse sound field, as shown in figure 1 for example, the resulting combination of signals can generate undesirable artifacts if matrix C is designed using the method described above. These artifacts can arise because the design of matrix C did not take into account possible interactions between the diffuse and non-diffuse portions of a sound field. As mentioned above, the distinction between diffuse and non-diffuse is not always definite, and the input signal analyzer 20 can generate signals along path 28 that represent a diffuse sound field to some degree and can generate signals along path 29 that represent a non-diffuse sound field to some degree. If the diffuse signal processor 40 destroys or modifies the non-diffuse character of the sound field represented by the signals on path 29, undesirable artifacts or audible distortions can occur in the sound field that is produced from the output signals generated along path 59. For example, if the sum of the M diffuse-processed signals on path 49 with the M non-diffuse-processed signals on path 39 causes cancellation of some non-diffuse signal components, this can degrade the subjective impression that would otherwise be achieved by use of the present invention.
[0041] An improvement can be achieved by designing matrix C to take into account the non-diffuse nature of the sound field that is processed by the non-diffuse signal processor 30. This can be done by first identifying a matrix E that either represents, or is assumed to represent, the encoding processing that processes M channels of audio signals to create the N channels of input audio signals received from path 19, and then deriving an inverse of this matrix as discussed below.
[0042] One example of a matrix E is a 5x2 matrix that is used to downmix five channels, L, C, R, LS and RS, into two channels denoted as left total (LT) and right total (RT). The signals for the LT and RT channels are one example of the input audio signals for two (N = 2) channels that are received from path 19. In this example, device 10 can be used to synthesize five (M = 5) channels of output audio signals that can create a sound field that is perceptually similar if not identical to the sound field that could have been created from the original five audio signals.
[0043] An exemplary 5 x 2 E matrix that can be used to encode LT and RT channel signals from the L, C, R, LS and RS channel signals is shown in the expression:

[0044] An M x N pseudo-inverse matrix B can usually be derived from the N x M matrix E using known numerical techniques, including those implemented in numerical software such as the “pinv” function in Matlab®, available from The MathWorks®, Natick, Massachusetts, or the “PseudoInverse” function in Mathematica®, available from Wolfram Research, Champaign, Illinois. Matrix B may not be optimal if its coefficients create undesirable interference between any of the channels, or if any coefficients are imaginary or complex numbers. Matrix B can be modified to remove these undesirable characteristics. It can also be modified to achieve any desired artistic effect by changing the coefficients to emphasize the signals for selected loudspeakers. For example, the coefficients can be changed to increase the energy in the signals intended for reproduction through the loudspeakers for the left and right channels and to decrease the energy in the signals intended for reproduction through the loudspeaker for the center channel. The coefficients in matrix B are scaled so that each column of the matrix represents a unit-magnitude vector in an M-dimensional space. The vectors represented by the columns of matrix B need not be substantially orthogonal to one another.
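For the two-channel case (N = 2), the pseudo-inverse can be sketched without a numerical library. The fragment below assumes that E has full row rank, in which case the Moore-Penrose pseudo-inverse equals $E^T (E E^T)^{-1}$, and then scales each column of the result to unit magnitude as described above. The function names and the coefficients of `E_example` are hypothetical; they are not the encoding matrix of this description.

```python
import math


def pseudo_inverse_2xM(E):
    """Pseudo-inverse B = E^T (E E^T)^-1 of a full-row-rank 2 x M matrix E.

    Returns an M x 2 matrix as a list of rows.
    """
    r0, r1 = E
    a = sum(x * x for x in r0)               # (E E^T)[0][0]
    b = sum(x * y for x, y in zip(r0, r1))   # (E E^T)[0][1] and [1][0]
    d = sum(y * y for y in r1)               # (E E^T)[1][1]
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("E does not have full row rank")
    # Row m of E^T is (x, y); multiply it by the inverse of [[a, b], [b, d]].
    return [[(x * d - y * b) / det, (y * a - x * b) / det]
            for x, y in zip(r0, r1)]


def unit_columns(B):
    """Scale every column of B to unit magnitude."""
    norms = [math.sqrt(sum(row[j] ** 2 for row in B))
             for j in range(len(B[0]))]
    return [[row[j] / norms[j] for j in range(len(row))] for row in B]


# Hypothetical 2 x 5 downmix coefficients, for illustration only.
E_example = [[1.0, 0.7, 0.0, 0.9, 0.4],
             [0.0, 0.7, 1.0, -0.4, -0.9]]
B_raw = pseudo_inverse_2xM(E_example)
B_unit = unit_columns(B_raw)
```

Any artistic adjustment of the coefficients, as discussed above, would be applied before the final column scaling.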
[0045] An exemplary 5 x 2 matrix B is shown in the following expression:

[0046] This matrix can be used to generate a set of M intermediate output signals from the N intermediate input signals by the following operation: Y = B.X (5)
[0047] This operation is illustrated schematically in figure 2. A mixer 41 receives the N intermediate input signals from signal paths 29-1 and 29-2 and mixes these signals according to a system of linear equations to generate a set of M intermediate output signals along signal paths 49-1 to 49-5. The boxes inside the mixer 41 represent the multiplication or amplification of the signal by coefficients of the matrix B according to the system of linear equations.
[0048] Although matrix B can be used alone, performance is improved by using an additional M x K augmentation matrix A, where 1 ≤ K ≤ (M-N). Each column in matrix A represents a unit-magnitude vector in an M-dimensional space that is substantially orthogonal to the vectors represented by the N columns of matrix B. If K is greater than one, each column represents a vector that is also substantially orthogonal to the vectors represented by all other columns in matrix A.
[0049] The vectors for the columns of matrix A can be derived in essentially any way that may be desired. The techniques mentioned above can be used. A preferred method is described below.
[0050] The coefficients in the augmentation matrix A and in matrix B can be scaled as explained below and concatenated to produce matrix C. The scaling and concatenation can be expressed algebraically as:
$$C = [\,\beta \cdot B \mid \alpha \cdot A\,] \qquad (6)$$
[0051] where | = horizontal concatenation of the columns of matrix B and matrix A;
[0052] α = scale factor for the matrix A coefficients; and
[0053] β = scale factor for the matrix B coefficients.
[0054] For many applications, the scale factors α and β are chosen so that the Frobenius norm of the composite matrix C is equal to or within 10% of the Frobenius norm of matrix B. The Frobenius norm of matrix C can be expressed as:
$$\|C\|_F = \sqrt{\sum_{i} \sum_{j} |c_{i,j}|^2}$$
[0055] where $c_{i,j}$ = matrix coefficient in row i and column j.
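The Frobenius norm is the square root of the sum of the squared magnitudes of all coefficients. The short sketch below computes it for a matrix stored as a list of rows; the function name `frobenius_norm` and the example matrix, whose two columns each have unit magnitude, are hypothetical.

```python
import math


def frobenius_norm(C):
    """Square root of the sum of squared magnitudes of all coefficients."""
    return math.sqrt(sum(abs(c) ** 2 for row in C for c in row))


# Hypothetical 3 x 2 matrix whose two columns are unit-magnitude vectors:
# (0.6, 0.8, 0.0) and (0.0, 0.0, 1.0).
B_demo = [[0.6, 0.0],
          [0.8, 0.0],
          [0.0, 1.0]]
norm_demo = frobenius_norm(B_demo)  # close to the square root of 2
```

With N unit-magnitude columns the squared norm is N, which is why the target value for the norm is the square root of N.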
[0056] If each of the N columns in matrix B and each of the K columns in matrix A represents a unit-magnitude vector, the Frobenius norm of matrix B is equal to $\sqrt{N}$ and the Frobenius norm of matrix A is equal to $\sqrt{K}$. For this case, it can be shown that if the Frobenius norm of matrix C is to be set equal to $\sqrt{N}$, then the values of the scale factors α and β are related to one another as shown in the following expression:
$$\alpha = \sqrt{\frac{N \cdot (1 - \beta^2)}{K}} \qquad (7)$$
[0057] After setting the value of the scale factor β, the value of the scale factor α can be calculated from expression 7. Preferably, the scale factor β is selected so that the signals mixed by the coefficients in the columns of matrix B are given a weight at least 5 dB greater than the signals mixed by the coefficients in the columns of the augmentation matrix A. A difference in weight of at least 6 dB can be achieved by constraining the scale factors so that α < ½·β. Greater or lesser differences in scaling weight for the columns of matrix B and matrix A can be used to achieve a desired acoustic balance between audio channels.
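The relation between the two scale factors follows from the squared Frobenius norm of the concatenated matrix: with unit-magnitude columns it equals $\beta^2 N + \alpha^2 K$, and setting this equal to N gives $\alpha = \sqrt{N (1 - \beta^2) / K}$. The sketch below evaluates this relation and the resulting difference in weight in decibels; the function names and the sample value β = 0.9 are hypothetical.

```python
import math


def augmentation_scale(beta, N, K):
    """Scale factor alpha for which beta^2 * N + alpha^2 * K equals N.

    Assumes every column of B and of A has unit magnitude.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.sqrt(N * (1.0 - beta ** 2) / K)


def weight_difference_db(beta, alpha):
    """Amplitude weight of the B columns relative to the A columns, in dB."""
    return 20.0 * math.log10(beta / alpha)


# Hypothetical choice for N = 2 inputs and K = 3 decorrelated signals.
beta = 0.9
alpha = augmentation_scale(beta, N=2, K=3)
difference = weight_difference_db(beta, alpha)
meets_6_db = alpha < 0.5 * beta
```

Under these assumptions, and for N = 2 and K = 3, the constraint α < ½·β is met whenever β exceeds the square root of 8/11, roughly 0.853.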
[0058] Alternatively, the coefficients in each column of the augmentation matrix A can be scaled individually as shown in the following expression:
$$C = [\,\beta \cdot B \mid \alpha_1 \cdot A_1 \;\; \alpha_2 \cdot A_2 \;\; \cdots \;\; \alpha_K \cdot A_K\,]$$
[0059] where $A_j$ = column j of the augmentation matrix A; and
[0060] $\alpha_j$ = the respective scale factor for column j.
[0061] For this alternative, arbitrary values can be chosen for each scale factor $\alpha_j$ as long as each scale factor satisfies the constraint $\alpha_j < \tfrac{1}{2}\beta$. Preferably, the values of the coefficients $\alpha_j$ and β are chosen to ensure that the Frobenius norm of C is approximately equal to the Frobenius norm of matrix B.
[0062] Each of the signals that are mixed according to the augmentation matrix A is processed so that it is psychoacoustically decorrelated from the N intermediate input signals and from all other signals that are mixed according to the augmentation matrix A. This is illustrated schematically in figure 3, which shows by way of example two (N = 2) intermediate input signals, five (M = 5) intermediate output signals and three (K = 3) decorrelated signals mixed according to the augmentation matrix A. In this example, the two intermediate input signals are mixed according to the base inverse matrix B, represented by box 41, and are decorrelated by the decorrelator 43 to provide three decorrelated signals that are mixed according to the augmentation matrix A, which is represented by box 42.
[0063] Decorrelator 43 can be implemented in a variety of ways. One implementation, shown in figure 4, achieves psychoacoustic decorrelation by delaying its input signals by different amounts. Delays in the range of one to twenty milliseconds are suitable for many applications.
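A delay-based decorrelator can be sketched as a bank of whole-sample delays. The fragment below is only an illustration under stated assumptions: it produces one delayed copy of a single input per requested delay, it rounds each delay to the nearest sample, and the function names, the 48 kHz default sample rate and the delay values are hypothetical rather than taken from figure 4.

```python
def delay_signal(signal, delay_samples):
    """Delay a signal by whole samples, zero-padding the start and
    truncating the end so that the length is unchanged."""
    if delay_samples < 0:
        raise ValueError("delay must be non-negative")
    return ([0.0] * delay_samples + list(signal))[:len(signal)]


def delay_decorrelate(signal, delays_ms, sample_rate=48000):
    """Return one delayed copy of `signal` per entry in `delays_ms`."""
    return [delay_signal(signal, round(ms * sample_rate / 1000.0))
            for ms in delays_ms]


# An impulse delayed by 1 ms and 3 ms at a hypothetical 1 kHz sample rate.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
copies = delay_decorrelate(impulse, delays_ms=[1, 3], sample_rate=1000)
```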
[0064] A portion of another implementation of the decorrelator 43 is shown in figure 5. This portion processes one of the intermediate input signals. An intermediate input signal is passed along two different signal processing paths that apply filters to their respective signals in two overlapping frequency subbands. The lower-frequency path includes a phase-flip filter 61, which filters its input signal in a first frequency subband according to a first impulse response, and a low-pass filter 62 that defines the first frequency subband. The higher-frequency path includes a frequency-dependent delay 63 implemented by a filter that filters its input signal in a second frequency subband according to a second impulse response that is not equal to the first impulse response, a high-pass filter 64 that defines the second frequency subband, and a delay component 65. The outputs of the delay 65 and the low-pass filter 62 are combined in the summing node 66. The output of the summing node 66 is a signal that is psychoacoustically decorrelated with respect to the intermediate input signal.
[0065] The phase response of the phase-flip filter 61 is frequency dependent and has a bimodal distribution in frequency with peaks substantially equal to positive and negative ninety degrees. An ideal implementation of the phase-flip filter 61 has a unit magnitude response and a phase response that alternates, or flips, between positive ninety degrees and negative ninety degrees at the edges of two or more frequency bands within the passband of the filter. A phase flip can be implemented by a sparse Hilbert transform that has an impulse response shown in the following expression:

[0066] The impulse response of the sparse Hilbert transform should be truncated to a length selected to optimize decorrelator performance by balancing a trade-off between transient performance and smoothness of the frequency response.
[0067] The number of phase flips is controlled by the value of the parameter S. This parameter should be chosen to balance a trade-off between the degree of decorrelation and the length of the impulse response. A longer impulse response is required as the value of S increases. If the value of S is too small, the filter provides insufficient decorrelation. If the value of S is too large, the filter smears transient sounds over an interval of time long enough to create objectionable artifacts in the decorrelated signal.
[0068] The ability to balance these characteristics can be improved by implementing the phase-flip filter 61 to have a non-uniform spacing in frequency between adjacent phase flips, with a narrower spacing at lower frequencies and a wider spacing at higher frequencies. Preferably, the spacing between adjacent phase flips is a logarithmic function of frequency.
[0069] The frequency-dependent delay 63 can be implemented by a filter that has an impulse response equal to a finite-length sinusoidal sequence h[n] whose instantaneous frequency decreases monotonically from π to zero over the duration of the sequence. This sequence can be expressed as:
$$h[n] = G \sqrt{|\omega'(n)|}\, \cos(\phi(n)), \quad \text{for } 0 \le n < L$$
[0070] where ω(n) = the instantaneous frequency;
[0071] ω'(n) = the first derivative of the instantaneous frequency;
[0072] G = normalization factor;
[0073] $\phi(n) = \int_0^n \omega(t)\,dt$ = instantaneous phase; and
[0074] L = length of the delay filter.
[0075] The normalization factor G is set to a value such that:
$$\sum_{n=0}^{L-1} h^2[n] = 1$$
[0076] A filter with this impulse response can sometimes generate "chirping" artifacts when it is applied to audio signals with transients. This effect can be reduced by adding a noise-like term N(n) to the instantaneous phase term as shown in the following expression:
$$h[n] = G \sqrt{|\omega'(n)|}\, \cos(\phi(n) + N(n)), \quad \text{for } 0 \le n < L$$
[0077] If the noise-like term is a white Gaussian noise sequence with a variance that is a small fraction of π, the artifacts that are generated by filtering transients will sound more like noise than chirps, and the desired relationship between delay and frequency is still achieved.
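The sinusoidal sequence with a falling instantaneous frequency can be sketched as follows. This is an illustration under several assumptions that go beyond the text: the instantaneous frequency is taken to fall linearly from π to zero, so the amplitude is constant before normalization; the instantaneous phase is approximated by a running sum of the instantaneous frequency; the normalization factor is chosen to give unit energy; and the optional noise term is drawn from a seeded Gaussian generator. The function name and all parameter values are hypothetical.

```python
import math
import random


def chirp_delay_filter(L, noise_std=0.0, seed=0):
    """Impulse response of length L whose instantaneous frequency falls
    linearly from pi toward zero (assumed sweep shape).

    noise_std is the standard deviation of an optional Gaussian term added
    to the instantaneous phase; 0.0 disables it.
    """
    if L < 1:
        raise ValueError("L must be at least 1")
    rng = random.Random(seed)
    phase = 0.0
    h = []
    for n in range(L):
        omega = math.pi * (1.0 - n / L)   # instantaneous frequency
        phase += omega                    # running sum approximates the integral
        noise = rng.gauss(0.0, noise_std) if noise_std > 0.0 else 0.0
        h.append(math.cos(phase + noise))
    # Assumed normalization: scale the sequence to unit energy.
    gain = 1.0 / math.sqrt(sum(x * x for x in h))
    return [gain * x for x in h]


h_plain = chirp_delay_filter(256)
h_noisy = chirp_delay_filter(256, noise_std=0.1)
```

Because high frequencies occur early in the sequence and low frequencies late, convolving a signal with such a response delays low-frequency components more than high-frequency ones.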
[0078] The cutoff frequencies of the low-pass filter 62 and the high-pass filter 64 should be chosen to be approximately 2.5 kHz so that there is no gap between the passbands of the two filters and so that the spectral energy of their combined outputs in the region near the crossover frequency, where the passbands overlap, is substantially equal to the spectral energy of the intermediate input signal in this region. The amount of delay imposed by the delay 65 should be set so that the propagation delays of the higher-frequency and lower-frequency signal processing paths are approximately equal at the crossover frequency.
[0079] The decorrelator can be implemented in different ways. For example, either or both of the low-pass filter 62 and the high-pass filter 64 may precede the phase-flip filter 61 and the frequency-dependent delay 63, respectively. The delay 65 can be implemented by one or more delay components placed in the signal processing paths as desired.
[0080] Additional implementation details can be obtained from international patent application No. PCT/US2009/058590 entitled “Decorrelator for Upmixing Systems” by McGrath et al., which was filed on September 28, 2009.
D. Preferred Derivation Method
[0081] A preferred method for deriving the augmentation matrix A starts by creating a "seed matrix" P. The seed matrix P contains initial estimates for the coefficients of the augmentation matrix A. Columns are selected from the seed matrix P to form an interim matrix Q. The interim matrix Q is used to form a second interim matrix R. Columns of coefficients are extracted from the interim matrix R to obtain the augmentation matrix A. A method that can be used to create the seed matrix P is described below, after the procedure for forming the interim matrix Q, the interim matrix R and the augmentation matrix A.
1. Derivation of the augmentation matrix A
[0082] The basic inverse matrix B described above has M rows and N columns. A seed matrix P is created that has M rows and K columns, where 1 ≤ K ≤ (M − N). Matrix B and seed matrix P are horizontally concatenated to form an interim matrix Q, which has M rows and N + K columns. This concatenation can be expressed as:
Q = [ B | P ]
[0083] The coefficients in each column j of the interim matrix Q are scaled so that they represent unit-magnitude vectors Q(j) in an M-dimensional space. This can be done by dividing the coefficients in each column by the magnitude of the vector they represent. The magnitude of each vector is the square root of the sum of the squares of the coefficients in the column.
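The concatenation and column scaling described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names hconcat and normalize_columns, the representation of a matrix as a list of rows, and the example values of B and P are assumptions made here and do not come from the specification.

```python
import math

def hconcat(b, p):
    """Horizontally concatenate two matrices given as lists of rows: Q = [B | P]."""
    if len(b) != len(p):
        raise ValueError("B and P must have the same number of rows (M)")
    return [row_b + row_p for row_b, row_p in zip(b, p)]

def normalize_columns(q):
    """Scale every column of q so that it has unit Euclidean magnitude."""
    m, n_cols = len(q), len(q[0])
    out = [[0.0] * n_cols for _ in range(m)]
    for j in range(n_cols):
        # Magnitude = square root of the sum of the squared coefficients.
        mag = math.sqrt(sum(q[i][j] ** 2 for i in range(m)))
        if mag == 0.0:
            raise ValueError(f"column {j} has zero magnitude")
        for i in range(m):
            out[i][j] = q[i][j] / mag
    return out

# Hypothetical example values with M = 3, N = 1, K = 1.
B = [[3.0], [4.0], [0.0]]
P = [[1.0], [1.0], [1.0]]
Q = normalize_columns(hconcat(B, P))
```

After this step every column of Q has magnitude one, which is the starting condition expected by the orthogonalization procedure that follows.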
[0084] An interim matrix R that has coefficients arranged in M rows and N + K columns is then obtained from the interim matrix Q. The coefficients in each column j of the interim matrix R represent a vector R(j) in an M-dimensional space. These column vectors are calculated by a process represented by the following pseudocode fragment:
(1)  R(1) = Q(1);
(2)  for j = 2 to N+K {
(3)      T(j) = (I - RR(j-1) * TRANSP[RR(j-1)]) * Q(j);
(4)      if MAG[T(j)] > 0.001 {
(5)          R(j) = T(j) / MAG[T(j)];
(6)      } else {
(7)          R(j) = ZERO;
(8)      }
(9)  }
(10) for j = 1 to K {
(11)     A(j) = R(j+N);
(12) }
[0085] The statements in this pseudocode fragment have syntactic features similar to those of the C programming language. This code fragment is not intended to be a practical implementation; it is intended only to help explain a process that can calculate the augmentation matrix A.
[0086] The notations R(j), Q(j), T(j) and A(j) represent column j of the interim matrix R, the interim matrix Q, a temporary matrix T and the augmentation matrix A, respectively.
[0087] The notation RR(j-1) represents a submatrix of the matrix R with M rows and j-1 columns. This submatrix comprises columns 1 to j-1 of the interim matrix R.
[0088] The notation TRANSP[RR(j-1)] represents a function that returns the transpose of the matrix RR(j-1). The notation MAG[T(j)] represents a function that returns the magnitude of the column vector T(j), which is the Euclidean norm of the coefficients in column j of the temporary matrix T. The symbol I in statement (3) denotes the M × M identity matrix.
[0089] Referring to the pseudocode fragment, statement (1) initializes the first column of matrix R from the first column of matrix Q. Statements (2) to (9) implement a loop that calculates columns 2 to N+K of matrix R; all N+K columns are needed because statement (11) reads columns N+1 to N+K.
[0090] Statement (3) calculates column j of the temporary matrix T from the submatrix RR and the interim matrix Q. As explained above, the submatrix RR(j-1) comprises the first j-1 columns of the interim matrix R. Statement (4) determines whether the magnitude of the column vector T(j) is greater than 0.001. If it is greater, then statement (5) sets the vector R(j) equal to the vector T(j) after it has been scaled to have unit magnitude. If the magnitude of the column vector T(j) is not greater than 0.001, then statement (7) sets the vector R(j) equal to a ZERO vector with all elements equal to zero.
[0091] Statements (10) to (12) implement a loop that obtains the M × K augmentation matrix A from the last K columns of the interim matrix R, which are columns N+1 to N+K. The column vectors in the augmentation matrix A are substantially orthogonal to one another as well as to the column vectors of the basic matrix B.
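The pseudocode procedure amounts to a classical Gram-Schmidt orthogonalization of the columns of Q. The following Python sketch illustrates it under a few stated assumptions: matrices are lists of rows, the loop covers all N + K columns (assumed here because statement (11) reads columns N+1 to N+K), and the names column, orthonormalize and augmentation as well as the example values are hypothetical. The product (I - RR * TRANSP[RR]) * Q(j) is evaluated by subtracting from Q(j) its projection onto each earlier column of R, which gives the same result without forming the M × M matrix.

```python
import math

def column(mat, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in mat]

def orthonormalize(q, tol=0.001):
    """Return the columns of R computed from Q as in statements (1) to (9)."""
    m, n_cols = len(q), len(q[0])
    r_cols = [column(q, 0)]                       # statement (1)
    for j in range(1, n_cols):                    # statement (2)
        qj = column(q, j)
        t = list(qj)
        for r in r_cols:                          # statement (3)
            dot = sum(ri * qi for ri, qi in zip(r, qj))
            t = [ti - dot * ri for ti, ri in zip(t, r)]
        mag = math.sqrt(sum(ti * ti for ti in t))
        if mag > tol:                             # statements (4) and (5)
            r_cols.append([ti / mag for ti in t])
        else:                                     # statement (7): ZERO vector
            r_cols.append([0.0] * m)
    return r_cols

def augmentation(q, n):
    """Return the columns of A, i.e. columns N+1 to N+K of R (statements 10-12)."""
    return orthonormalize(q)[n:]

# Hypothetical example with M = 3, N = 1, K = 2; columns of Q have unit magnitude.
s2, s3 = 1 / math.sqrt(2), 1 / math.sqrt(3)
Q = [[1.0, s2, s3],
     [0.0, s2, s3],
     [0.0, 0.0, s3]]
A_cols = augmentation(Q, n=1)
```

For these example values the two columns of A come out, to within rounding, as (0, 1, 0) and (0, 0, 1), both orthogonal to the first column (1, 0, 0) that plays the role of B.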
[0092] If statement (4) determines that the magnitude of any column vector T(j) is not greater than 0.001, this indicates that the vector Q(j) is not sufficiently linearly independent of the column vectors Q(1) through Q(j-1), and the corresponding column vector R(j) is set equal to the ZERO vector. If any of the column vectors R(j) for N < j ≤ N+K is equal to the ZERO vector, then the corresponding column of the seed matrix P is not linearly independent of the columns that precede it. This latter situation is corrected by obtaining a new column for the seed matrix P and carrying out the process again to derive another augmentation matrix A.
a) Selection of the seed matrix P
[0093] The M × K seed matrix P can be created in a variety of ways. Two ways are described in the following paragraphs.
[0094] A first way creates the seed matrix by generating an M × K array of coefficients that have pseudo-random values.
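A minimal sketch of this first way is given below. The specification only calls for pseudo-random values, so the uniform distribution over the range -1 to 1, the function name random_seed_matrix and its optional seed argument are assumptions made for illustration.

```python
import random

def random_seed_matrix(m, k, seed=None):
    """Return an M x K matrix (list of rows) of pseudo-random coefficients."""
    rng = random.Random(seed)  # local generator so results can be reproduced
    return [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(m)]

# Hypothetical sizes: M = 5 output channels, K = 3 intermediate signals.
P = random_seed_matrix(5, 3, seed=1)
```

If a column of such a matrix later turns out not to be sufficiently independent of the preceding columns, a fresh column can be drawn and the derivation repeated.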
[0095] A second way generates a seed matrix with coefficients that account for symmetries in the anticipated locations of the acoustic transducers that will be used to reproduce the sound field represented by the intermediate output signals. This can be done by temporarily reordering the columns of the seed matrix during its creation.
[0096] For example, the five channel matrix described above generates signals for channels listed in order such as L, C, R, LS and RS. The anticipated symmetries of the speaker positioning for this particular set of channels can be used more easily by rearranging the channels in order according to the azimuth location of their respective acoustic transducer. A suitable order is LS, L, C, R and RS, which positions the central channel C in the middle of the set.
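The reordering between the channel order and the azimuth order is a simple permutation and can be sketched as follows. The helper names reorder and symmetry and the two example vectors are hypothetical; they are not coefficient values taken from this specification.

```python
CHANNEL_ORDER = ["L", "C", "R", "LS", "RS"]   # order used by the mixing matrix
AZIMUTH_ORDER = ["LS", "L", "C", "R", "RS"]   # order by loudspeaker azimuth

def reorder(vector, from_order, to_order):
    """Permute the elements of vector from one channel order to another."""
    by_name = dict(zip(from_order, vector))
    return [by_name[name] for name in to_order]

def symmetry(vector):
    """Classify mirror symmetry about the middle element of the vector."""
    mirrored = vector[::-1]
    if vector == mirrored:
        return "even"
    if vector == [-x for x in mirrored]:
        return "odd"
    return None

# Hypothetical candidate vectors written in azimuth order (LS, L, C, R, RS).
even_vec = [1, 2, 3, 2, 1]
odd_vec = [1, 2, 0, -2, -1]
even_in_channel_order = reorder(even_vec, AZIMUTH_ORDER, CHANNEL_ORDER)
odd_in_channel_order = reorder(odd_vec, AZIMUTH_ORDER, CHANNEL_ORDER)
```

Working in azimuth order makes the left/right mirror pairs (LS with RS, L with R) sit at mirrored positions, so a symmetry check reduces to comparing a vector with its reverse.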
[0097] Using this order, a set of candidate vectors can be constructed that has appropriate symmetry. An example is shown in Table I, where each vector is shown in a respective row of the table. The transposes of these vectors will be used to define the columns of the seed matrix P.

Table I
[0098] Each row in the table has either even or odd symmetry with respect to the column for the center channel. A total of K vectors are chosen from the table, transposed and used to form an initial matrix P′. For example, if K = 3 and the vectors chosen are those for the functions FE1, FE2 and FO1, then the initial matrix P′ is:

[0099] The order of the vector elements is then changed to conform to the channel order of the desired seed matrix P. This produces the following matrix:

[00100] If this seed matrix P is used with the basic matrix B shown in expression 4, the interim matrix Q obtained by the process described above is:

[00101] The second interim matrix R formed from this matrix Q is:

[00102] The augmentation matrix A obtained from this interim matrix R is:
E. Implementation
[00103] Devices that incorporate various aspects of the present invention can be implemented in a variety of ways, including software for execution by a computer or by some other device that includes more specialized components such as digital signal processor (DSP) circuitry coupled to components similar to those found in a general-purpose computer. Figure 6 is a schematic block diagram of a device 70 that can be used to implement aspects of the present invention. Processor 72 provides computing resources. RAM 73 is system random access memory (RAM) used by processor 72 for processing. ROM 74 represents some form of persistent storage such as read-only memory (ROM) for storing programs needed to operate device 70 and possibly to carry out various aspects of the present invention. The I/O control 75 represents interface circuitry for receiving and transmitting signals via the communication signal paths 19, 59. In the embodiment shown, all the main system components are connected to the bus 71, which can represent more than one physical or logical bus; however, a bus architecture is not required to implement the present invention.
[00104] In embodiments implemented by a general-purpose computer system, additional components can be included to interface with devices such as a keyboard or mouse and a display, and to control a storage device that has a storage medium such as magnetic tape or disk or an optical medium. The storage medium can be used to record programs of instructions for operating systems, utilities and applications, and can include programs that implement various aspects of the present invention.
[00105] The functions required to practice various aspects of the present invention can be performed by components that are implemented in a wide variety of ways, including discrete logic components, integrated circuits, one or more ASICs and/or program-controlled processors. The manner in which these components are implemented is not important to the present invention.
[00106] Software implementations of the present invention can be conveyed by a variety of machine-readable media, such as baseband or modulated communication paths throughout the spectrum from supersonic to ultraviolet frequencies, or storage media that convey information using essentially any recording technology, including magnetic tape, cards or disk, optical cards or disc, and detectable markings on media including paper.
Claims:
Claims (10)
[0001]
1. Method for deriving M output audio signals from N input audio signals for presentation of a diffuse sound field, where M is greater than N and greater than two, the method characterized by the fact that it comprises the steps of: receiving the N input audio signals, where the N input audio signals represent a diffuse sound field; deriving K intermediate audio signals from the N input audio signals so that each intermediate audio signal is psychoacoustically decorrelated with the N input audio signals and, if K is greater than one, is psychoacoustically decorrelated with all the other intermediate audio signals, where K is greater than or equal to one and is less than or equal to M−N; and mixing the N input audio signals and the K intermediate audio signals to derive the M output audio signals, where the mixing is performed according to a system of linear equations with coefficients of a matrix that specify a set of N+K vectors in an M-dimensional space, where each of N vectors in the set specifies mixing coefficients applied to a different one of the N input audio signals, where each of K vectors in the set specifies mixing coefficients applied to a different one of the K intermediate signals, and where at least K of the N+K vectors are substantially orthogonal to all other vectors in the set.
[0002]
2. Method, according to claim 1, characterized by the fact that each of the K intermediate signals is derived by delaying one of the N input audio signals.
[0003]
3. Method, according to claim 1, characterized by the fact that a respective intermediate signal is derived by a method that comprises: filtering one of the N input audio signals according to a first impulse response in a first frequency subband to obtain a first subband signal with a frequency-dependent change in phase that has a bimodal frequency distribution with peaks equal to positive and negative ninety degrees, and according to a second impulse response in a second frequency subband to obtain a second subband signal with a frequency-dependent delay, where: the second impulse response is not the same as the first impulse response, the second frequency subband includes frequencies that are higher than the frequencies included in the first frequency subband, and the first frequency subband includes frequencies that are lower than the frequencies included in the second frequency subband; and deriving the respective intermediate signal from a combination of the first subband signal and the second subband signal.
[0004]
4. Method, according to any one of claims 1 to 3, characterized by the fact that N is greater than one.
[0005]
5. Method, according to any one of claims 1 to 4, characterized by the fact that: the matrix comprises a first submatrix of coefficients for the N vectors, with coefficients that are scaled by a first scale factor β, and a second submatrix of coefficients for the K vectors that are scaled by one or more second scale factors α; the N input audio signals are mixed according to a system of linear equations with the coefficients of the first submatrix as scaled by the first scale factor; and the K intermediate audio signals are mixed according to a system of linear equations with the coefficients of the second submatrix as scaled by the one or more second scale factors.
[0006]
6. Method, according to claim 5, characterized by the fact that: the second submatrix of coefficients for K vectors is scaled by one second scale factor α; and the first scale factor and the second scale factor are chosen so that the Frobenius norm of the matrix is within 10% of the Frobenius norm of the first submatrix not scaled by the first scale factor β; and
[0007]
7. Method for obtaining a matrix of coefficients for a system of linear equations for use in mixing N input audio signals that represent a diffuse sound field to derive M output audio signals for presentation of a diffuse sound field, the method characterized by the fact that it comprises the steps of: obtaining a first matrix that has coefficients that specify a set of N first vectors in an M-dimensional space; deriving a set of K second vectors in the M-dimensional space, with each second vector being substantially orthogonal to each first vector and, if K is greater than one, to all other second vectors; obtaining a second matrix that has coefficients that specify the set of K second vectors; and concatenating the first matrix with the second matrix to obtain an intermediate matrix that has coefficients that specify a union of the set of N first vectors and the set of K second vectors, where the coefficients of the signal processing matrix are the coefficients of the system of linear equations.
[0008]
8. Method, according to claim 7, characterized by the fact that it comprises scaling the coefficients of the intermediate matrix so that a Frobenius norm of the scaled intermediate matrix is within 10% of a Frobenius norm of the first matrix.
[0009]
9. Apparatus, characterized by the fact that it comprises: one or more input terminals for receiving input signals; memory; a storage medium recording instructions for carrying out the method as defined in any one of claims 1 to 8; processing circuitry coupled to the one or more input terminals, the memory, the storage medium and one or more output terminals to carry out the method; and the one or more output terminals for transmitting output signals.
[0010]
10. Storage medium for recording a method characterized by the fact that it is executable by a device to perform the method as defined in any one of claims 1 to 8.
Patent family:
Publication number | Publication date
AR081098A1|2012-06-13|
ES2588222T3|2016-10-31|
TWI444989B|2014-07-11|
WO2011090834A1|2011-07-28|
US9269360B2|2016-02-23|
BR112012018291A2|2018-06-05|
EP2526547B1|2016-07-06|
KR101380167B1|2014-04-02|
KR20120102127A|2012-09-17|
CN102714039A|2012-10-03|
RU2012134496A|2014-02-27|
US20120321105A1|2012-12-20|
JP2013517687A|2013-05-16|
JP5612125B2|2014-10-22|
EP2526547A1|2012-11-28|
MX2012008403A|2012-08-15|
TW201140561A|2011-11-16|
RU2519045C2|2014-06-10|
CN102714039B|2014-09-10|
Legal status:
2019-01-08| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-09-10| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-04-14| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2020-10-27| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 07/01/2011, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
US29769910P| true| 2010-01-22|2010-01-22|
US61/297,699|2010-01-22|
PCT/US2011/020561|WO2011090834A1|2010-01-22|2011-01-07|Using multichannel decorrelation for improved multichannel upmixing|